Overview
This document covers the hyperparameter tuning of the models. For models
with several hyperparameters, such as Random Forest and Gradient
Boosting Trees, further tuning is required to ensure near-optimal values
are found. Exhaustively searching the hyperparameter space of the
ensemble techniques would not be computationally sensible. Instead, the
tuning can be done in two phases: the first phase is broad and general,
while the final phase searches a finer range of values using the
insights from the first. This document covers the final phase.
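The two-phase approach could be sketched as follows. This is a minimal illustration, assuming synthetic data and illustrative value ranges rather than the actual grids used in this work:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Phase 1: broad, coarse grid spanning several orders of magnitude.
coarse = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.001, 0.01, 0.1, 1, 10, 100]}, cv=3)
coarse.fit(X, y)
best_c = coarse.best_params_["C"]

# Phase 2: finer grid centred on the best value from the coarse phase.
fine_grid = {"C": [best_c * f for f in (0.25, 0.5, 1.0, 2.0, 4.0)]}
fine = GridSearchCV(LogisticRegression(max_iter=1000), fine_grid, cv=3)
fine.fit(X, y)
print(fine.best_params_)
```

The second grid is anchored on the first phase's winner, so the fine search stays small while still refining the estimate.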

DUM - Dummy
ERF - ExtraRandomForest
GNB - GaussianNaiveBayes
GB - GradientBoosting
KNN - KNearestNeighbours
LSVM - LinearSVC
LG - LogisticRegression
RF - RandomForest
SVM - SupportVectorMachine
Logistic Regression
Logistic Regression (LG) was trained across three different
hyperparameters, each relating to regularisation:
- C, the strength of regularisation; larger values indicate a smaller
regularisation effect.
- Penalty, the four types of regularisation tested: None (no
regularisation), L1, L2, and ElasticNet (L1 + L2).
- L1 ratio, the ratio between L1 and L2 (only applicable to
ElasticNet).
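A minimal sketch of how this grid could be set up with scikit-learn, assuming illustrative value ranges. The grid is split into sub-grids so L1 ratio is only searched under ElasticNet:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# The "saga" solver supports all four penalty options.
# Note: scikit-learn versions before 1.2 use the string "none" for None.
param_grid = [
    {"penalty": ["l1", "l2"], "C": [0.1, 1, 10]},
    {"penalty": ["elasticnet"], "C": [0.1, 1, 10],
     "l1_ratio": [0.2, 0.5, 0.8]},
    {"penalty": [None]},
]
search = GridSearchCV(
    LogisticRegression(solver="saga", max_iter=5000), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```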
[Table: C, Penalty, L1 Ratio — best hyperparameters for each metric]
K-Nearest Neighbours
K-Nearest Neighbours (KNN) was trained across only two
hyperparameters:
- K (number of neighbours), the number of neighbouring points involved
in the calculation.
- Weight, how the proximity of these neighbouring points is weighted
(Uniform, or proportional to Distance).
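With only two hyperparameters the grid is small enough to search exhaustively. A minimal sketch, assuming illustrative values of K:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

# "uniform" weights every neighbour equally; "distance" weights
# neighbours by the inverse of their distance.
grid = {"n_neighbors": [3, 5, 7, 9, 11],
        "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```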

[Table: K, Weight — best hyperparameters for each metric]
Support Vector Machine
A variety of Support Vector Machines (SVM) were trained on four
hyperparameters:
- C, as in logistic regression, the regularisation factor (smaller
values mean a larger penalty).
- Kernel, the core of the SVM: Linear, Polynomial, or a Radial Basis
Function (RBF).
- Gamma, which determines the influence of a single data point; larger
values mean the points need to be closer to influence each other.
- Class Weight, the weight for the individual C values, one per class.
The default is 1; "Balanced" sets weights inversely proportional to
class frequency.
Note: Gamma only applies to the polynomial and RBF kernels, and
scikit-learn offers two methods of determining an appropriate value,
"Auto" and "Scale"; the maths behind the scenes should be described in
the future.
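Because gamma only applies to the polynomial and RBF kernels, the grid can be split into sub-grids so the linear kernel is not needlessly re-fitted per gamma value. A minimal sketch with illustrative values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Sub-grid 1: linear kernel, no gamma.
# Sub-grid 2: poly/RBF kernels, gamma set by "scale" or "auto".
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10],
     "class_weight": [None, "balanced"]},
    {"kernel": ["poly", "rbf"], "C": [0.1, 1, 10],
     "gamma": ["scale", "auto"], "class_weight": [None, "balanced"]},
]
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```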
[Table: C, Kernel, Gamma, Class Weight — best hyperparameters for each
metric]
Linear Support Vector Machine*
*Scikit-learn has two APIs for an SVM; the second (LinearSVC) only
supports a linear kernel but offers more methods of regularisation. It
is also reported to be significantly quicker, by an order of magnitude,
on larger datasets.
A Linear SVM (LSVM) does not take the kernel or gamma hyperparameters,
since it is constrained to a linear kernel, which has no gamma
parameter. However, due to various technical issues, only the L2
penalty was used rather than also including L1 and ElasticNet. The
hyperparameters used were:
- C, the scale of regularisation; larger values indicate smaller
penalties.
- Loss, how an error is determined: Hinge or Squared Hinge.
- Class Weight, the weight for the individual C values, one per class.
The default is 1; "Balanced" sets weights inversely proportional to
class frequency.
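A minimal sketch of the LinearSVC grid, with the penalty fixed to L2 as described above and illustrative C values. Note that hinge loss is only supported with the dual formulation, so `dual=True` is fixed here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# Penalty fixed to L2; hinge loss requires dual=True.
grid = {"C": [0.01, 0.1, 1, 10],
        "loss": ["hinge", "squared_hinge"],
        "class_weight": [None, "balanced"]}
search = GridSearchCV(
    LinearSVC(penalty="l2", dual=True, max_iter=10000), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```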
[Table: C, Loss — best hyperparameters for each metric]
Random Forest
Random Forest (RF) is an ensemble technique that trains several
decision trees and aggregates across them to form a stronger predictor.
RF has several hyperparameters to test; not all were selected, as they
are not all equally important. The selected few were:
- Number of Trees, simply the number of trees trained.
- Max Depth, a common parameter to regularise a decision tree,
controlling how many layers it travels down before terminating.
- Minimum Samples in a Split, the minimum number of samples required to
split a node.
- Minimum Samples for a Leaf Node, the minimum number of samples
required before a leaf node is created.
- Max Number of Features, how many features to consider at each split.
- Criterion, Gini impurity or Shannon Entropy; determines the best
feature(s) to split the node.
- Class Weight, the weight applied to each class. The default is 1;
"Balanced" sets weights inversely proportional to class frequency.
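A minimal sketch of a fine-phase RF grid over these hyperparameters, assuming illustrative values kept deliberately small since the number of fits grows multiplicatively with each added option:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Small illustrative grid; the real fine phase would centre these
# ranges on the winners of the broad first phase.
grid = {"n_estimators": [50],
        "max_depth": [None, 8],
        "min_samples_split": [2, 4],
        "min_samples_leaf": [1, 2],
        "max_features": ["sqrt"],
        "criterion": ["gini", "entropy"],
        "class_weight": [None, "balanced"]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```

Even this trimmed grid is 32 combinations at 3 folds each, which illustrates why exhaustive search over all RF hyperparameters is impractical.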
[Table: Number of Trees, Max Depth, Min Samples Split, Min Samples
Leaf, Max Number of Features, Criterion, Class Weight]